fix(inference_provider): normalize provider endpoint errors #1797

Open
rodboev wants to merge 5 commits into NVIDIA-NeMo:main from rodboev:pr/inference-provider-error-normalization

@rodboev rodboev commented Jun 27, 2026

Summary

Normalize provider endpoint failures in responses_api_models/inference_provider so /v1/chat/completions and /v1/responses return structured provider-aware errors instead of falling through to the generic inner-server 500 path.

Closes #1748

Background

InferenceProvider.chat_completions() currently lets aiohttp.ClientResponseError escape directly from create_chat_completion(). Shared middleware then collapses that into the same generic JSON 500 string for auth failures, rate limits, missing models, and upstream 5xx responses. As more OpenAI-compatible providers land, that hides whether a failure is retryable and strips the provider context callers need to debug the endpoint.

Changes

  • catch ClientResponseError locally in responses_api_models/inference_provider/app.py and raise a structured HTTPException before shared middleware flattens the failure
  • normalize the surfaced payload to include provider_status, retryable, provider_context, model, category, and a concise provider-derived message
  • classify provider-neutral categories for authentication, request errors, model-not-found, rate limits, transient upstream failures, and fallback provider errors
  • add focused endpoint tests that cover structured auth failures, request errors, retry-exhausted 500 failures through the shared retry path, status-zero fallback, and the /v1/responses converted path
  • add helper coverage for retryable-status normalization and plain-text provider bodies without over-claiming route-level retry behavior
  • add direct helper coverage for top-level message and detail extraction, raw JSON fallback, string and None response_content, truncation, and the message-derived classification branches
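The normalization described in the list above could look roughly like the following. This is a minimal sketch and not the code in this PR. The helper names (`extract_message`, `classify`, `normalize_error`), the category strings, the retryable status set, and the 200-character truncation limit are all assumptions made for illustration. Only the payload keys come from the description.

```python
import json
from typing import Any

# Assumed values: the PR description does not list the exact statuses or limit.
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}
MAX_MESSAGE_CHARS = 200


def extract_message(response_content: Any) -> str:
    """Pull a concise message from a provider body (bytes, str, dict, or None)."""
    if response_content is None:
        return ""
    if isinstance(response_content, bytes):
        response_content = response_content.decode("utf-8", errors="replace")
    if isinstance(response_content, str):
        try:
            parsed = json.loads(response_content)
        except ValueError:
            # Plain-text provider body: use it as-is, truncated.
            return response_content.strip()[:MAX_MESSAGE_CHARS]
    else:
        parsed = response_content
    if isinstance(parsed, dict):
        # OpenAI-style nested error object first, then top-level keys.
        error = parsed.get("error")
        if isinstance(error, dict) and isinstance(error.get("message"), str):
            return error["message"][:MAX_MESSAGE_CHARS]
        for key in ("message", "detail"):
            if isinstance(parsed.get(key), str):
                return parsed[key][:MAX_MESSAGE_CHARS]
    # Raw JSON fallback when no recognizable message field exists.
    return json.dumps(parsed)[:MAX_MESSAGE_CHARS]


def classify(status: int, message: str) -> str:
    """Map a provider status and message to a provider-neutral category."""
    text = message.lower()
    if status in (401, 403):
        return "authentication"
    if status == 429:
        return "rate_limit"
    if status == 404 or ("model" in text and "not found" in text):
        return "model_not_found"
    if status in RETRYABLE_STATUSES or status >= 500:
        return "transient_upstream"
    if 400 <= status < 500:
        return "request_error"
    return "provider_error"  # fallback, e.g. status 0


def normalize_error(status: int, response_content: Any, model: str,
                    provider_context: str) -> dict:
    """Build the structured payload with the keys named in the description."""
    message = extract_message(response_content) or f"Provider returned status {status}"
    return {
        "provider_status": status,
        "retryable": status in RETRYABLE_STATUSES,
        "provider_context": provider_context,
        "model": model,
        "category": classify(status, message),
        "message": message,
    }
```

In a handler, a payload like this would presumably be passed as the `detail` of the raised HTTPException. The `retryable` flag here only describes the status and does not imply that the route itself retries.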

Out of scope

  • no retry policy changes
  • no edits to nemo_gym/openai_utils.py or shared middleware
  • no docs, config, or dependency updates

Validation

  • Not run locally: uv run pytest responses_api_models/inference_provider/tests/test_app.py -x
  • Not run locally: uv run pre-commit run --files responses_api_models/inference_provider/app.py responses_api_models/inference_provider/tests/test_app.py
  • Passed locally: ruff check --config pyproject.toml responses_api_models/inference_provider/app.py responses_api_models/inference_provider/tests/test_app.py
  • Passed locally: ruff format --config pyproject.toml --check responses_api_models/inference_provider/app.py responses_api_models/inference_provider/tests/test_app.py

Notes

On native Windows, uv run still fails in this repo because uvloop cannot be built on Windows during dependency resolution. This is the known environment limitation noted in the local repo config, so the focused pytest and pre-commit commands above were not run locally, and CI is the authoritative check for the behavioral tests added in this PR.

rodboev added 5 commits June 27, 2026 00:08
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>

copy-pr-bot Bot commented Jun 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@nemo-automation-bot nemo-automation-bot Bot added the community-request Issue reported or requested by someone from the community label Jun 27, 2026

Development

Successfully merging this pull request may close these issues.

Provider-aware endpoint error handling in inference_provider model server
